Abstract

In this paper, we suggest a novel data hiding technique for an html web page.
Html tags are case insensitive, so a lowercase letter and its uppercase
counterpart inside an html tag are interpreted in the same manner by the
browser; that is, a change of case in the tags of a web page is imperceptible
to the browser. We exploit this redundancy to embed secret data inside a web
page with no changes visible to the user of the page, who therefore has no
reason to suspect that data is hidden. The embedded data can be recovered by
viewing the source of the html page. This technique can easily be extended to
embed a secret message inside any piece of source code where the standard
interpreter of that language is case-insensitive.

1 Introduction

Some techniques for hiding data in executables have already been proposed
(e.g., Shin et al.). In this paper we introduce a very simple technique to hide
secret message bits inside source code as well. We describe our steganographic
technique using html source as the cover text, but it can easily be extended to
source code in any case-insensitive language. Html tags are directives to the
browser that carry information about how to structure and display the data on
a web page. They are not case sensitive, so tags in either case (or mixed case)
are interpreted by the browser in the same manner (e.g., "<head>" and "<HEAD>"
refer to the same thing). Hence there is a redundancy that we can exploit. If
the cases of the letters in the tags of the html cover text are manipulated to
embed secret message bits, the browser ignores this tampering, so it is
imperceptible to the user: there is no visible difference in the web page and
hence nothing to arouse suspicion. Also, when the web page is displayed in the
browser, only the text contents are displayed, not the tags (those can only be
seen when the user does 'view source'). Hence the secret message stays hidden
from the user.

Both the redundancy and the imperceptibility conditions for data hiding are
met, and we use them to embed data in html text. If we do not tamper with the
html text that is displayed by the browser as the web page (this html cover
text is analogous to the cover image in steganographic techniques for images),
the user will not suspect that data is hidden in the text. We only change the
case of the characters within the html tags (elements), in accordance with the
secret message bits that we want to embed inside the html web page. If we think
of the browser interpreter as a function $f_B : \Sigma^* \to \Sigma^*$, we see
that it is non-injective, i.e., not one-to-one, since $f_B(x) = f_B(y)$ whenever
$x \in \{'A' \ldots 'Z'\}$, $y \in \{'a' \ldots 'z'\}$ and $Uppercase(y) = x$.
The extraction process is also very simple: one just needs to do 'view source'
and observe the case patterns of the text within the tags to readily extract
the secret message (and see the unseen), while others will not know anything.
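
As an illustration of the non-injectivity of $f_B$, the following minimal
sketch uses Python's standard html.parser module as a stand-in for the browser
interpreter (an assumption made here; the paper does not name a parser). The
names TagCollector and parsed_view are hypothetical. The parser reports tag and
attribute names in lower case, so a cover page and a case-altered page yield
the same parse.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the tags and text that the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names already lower-cased.
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def parsed_view(source):
    """Stand-in for f_B: what a case-insensitive html parser sees."""
    collector = TagCollector()
    collector.feed(source)
    collector.close()
    return collector.events


cover = '<html><body><h1 align="center">Hi</h1></body></html>'
stego = '<HtML><bOdY><H1 aLiGn="center">Hi</h1></BODY></html>'

# The two sources differ, yet the parser reports identical events.
print(cover != stego, parsed_view(cover) == parsed_view(stego))
```

The attribute value "center" is left untouched in both strings, in line with
the restriction to tag text that the paper adopts.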

The length (in bits) of the secret message to be embedded is bounded above by
the total size of the text inside the html tags. Here we do not consider
attribute values for data embedding. If attribute values are used, more care is
needed, since some of them are case-sensitive: in <A HREF="link.html">, for
example, the link file name may be case-sensitive on some systems, whereas an
attribute value such as the one in <h2 align="center"> is safe. If fewer bits
are to be embedded than the cover text can hold, we can record the length of
the embedded data inside a header tag (e.g., '<Header 25>' if the length of the
secret data to be embedded is 25 bits), which will not be shown in the browser
(optionally, this integer value can be encrypted with some private key). To
improve the robustness of this very simple algorithm, one may apply some simple
encryption to the data to be embedded.
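
The capacity bound can be estimated with a short sketch such as the one below.
It is only an approximation under stated assumptions: tags are found with a
regular expression (so a '>' inside an attribute value would confuse it), only
quoted attribute values are excluded, and the name capacity_in_bits is
hypothetical.

```python
import re

TAG = re.compile(r"<[^>]*>")                      # one html tag, naively
QUOTED_VALUE = re.compile(r"\"[^\"]*\"|'[^']*'")  # a quoted attribute value


def capacity_in_bits(html):
    """Count the letters inside tags, ignoring quoted attribute values."""
    total = 0
    for tag in TAG.findall(html):
        without_values = QUOTED_VALUE.sub("", tag)
        total += sum(1 for ch in without_values if ch.isascii() and ch.isalpha())
    return total


page = '<html><body><a href="Link.html">Hi</a></body></html>'
print(capacity_in_bits(page))  # letters of html, body, a, href, a, body, html
```

A message of k bits fits only if k does not exceed this count; the file name
"Link.html" and the displayed text "Hi" contribute nothing.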

2 The Algorithm for Embedding 

The algorithm for embedding the secret message inside the html cover text is
very simple and straightforward. First, we need to separate out the characters
of the cover text that are candidates for embedding; these are the
case-insensitive text characters inside html tags. Figure 2 shows a very
simplified automaton for this purpose.
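
A simplified automaton of this kind might look as follows. This is a sketch
with three assumed states (displayed text, inside a tag, inside a quoted
attribute value), not a transcription of the automaton in Figure 2; the names
State and candidate_positions are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    TEXT = auto()   # outside any tag: displayed content, never touched
    TAG = auto()    # inside <...>: letters here are candidates
    VALUE = auto()  # inside a quoted attribute value: skipped


def candidate_positions(html):
    """Yield the index of every letter that may carry a secret bit."""
    state = State.TEXT
    quote = ""
    for i, ch in enumerate(html):
        if state is State.TEXT:
            if ch == "<":
                state = State.TAG
        elif state is State.TAG:
            if ch == ">":
                state = State.TEXT
            elif ch in "\"'":
                state, quote = State.VALUE, ch
            elif ch.isascii() and ch.isalpha():
                yield i
        elif ch == quote:  # State.VALUE: wait for the matching quote
            state = State.TAG


sample = '<td align="left">Hi</td>'
print("".join(sample[i] for i in candidate_positions(sample)))  # tdaligntd
```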

We define the following functions before describing the algorithm: 

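• $l : \Sigma^* \to \Sigma^*$ is defined by

$$l(c) = \begin{cases} ToLower(c) & c \in \{'A' \ldots 'Z'\} \\ c & \text{otherwise} \end{cases}$$

where $ToLower(c) = c + 32$.

• Similarly, $u : \Sigma^* \to \Sigma^*$ is defined by

$$u(c) = \begin{cases} ToUpper(c) & c \in \{'a' \ldots 'z'\} \\ c & \text{otherwise} \end{cases}$$

where $ToUpper(c) = c - 32$.

Here the ascii value of 'A' is 65 and that of 'a' is 97, with a difference of 32.

It is easy to see that if the domain is restricted to
$\Sigma^* = \{'a' \ldots 'z'\} \cup \{'A' \ldots 'Z'\}$, then
$l : \{'A' \ldots 'Z'\} \to \{'a' \ldots 'z'\}$ and
$u : \{'a' \ldots 'z'\} \to \{'A' \ldots 'Z'\}$, so on letters each function
undoes the other: $u(l(c)) = c$ for uppercase $c$ and $l(u(c)) = c$ for
lowercase $c$.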
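
The two case-mapping functions can be written directly from the ascii
arithmetic above. The sketch below uses the hypothetical names to_lower and
to_upper for $l$ and $u$ and assumes ascii input.

```python
def to_lower(c):
    """l(c): shift 'A'..'Z' up by 32 code points, leave the rest alone."""
    return chr(ord(c) + 32) if "A" <= c <= "Z" else c


def to_upper(c):
    """u(c): shift 'a'..'z' down by 32 code points, leave the rest alone."""
    return chr(ord(c) - 32) if "a" <= c <= "z" else c


print(ord("A"), ord("a"), to_lower("A"), to_upper("a"), to_lower("<"))
```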

Now, we want to embed the secret data bits $b_1 b_2 \ldots b_k$ inside the
case-insensitive text inside the html tags. Let $c_1 c_2 \ldots c_n$ denote the
sequence of characters inside the html tags in the cover text (input html). A
character $c_i$ is a candidate for hiding a secret message bit iff it is a
letter. If we want to hide the $j$-th secret message bit $b_j$ inside the cover
text character $c_i$, the corresponding stego-text character is given by the
following function $f_{stego}$. If $c_i \in \{'a' \ldots 'z'\} \cup \{'A' \ldots 'Z'\}$,
i.e., if $IsAlphabet(c_i)$ is true,

$$f_{stego}(c_i) = \begin{cases} l(c_i) & b_j = 0 \\ u(c_i) & b_j = 1 \end{cases}$$

and $f_{stego}(c_i) = c_i$ otherwise. Hence, we have the following:

$$c_i \in \{'a' \ldots 'z'\} \cup \{'A' \ldots 'Z'\} \Rightarrow f_{stego}(c_i) = l(c_i).\overline{b_j} + u(c_i).b_j, \quad \forall i \qquad (1)$$
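
Equation (1) can be checked on the small example of Figure 1 with a sketch
like the following, which evaluates the expression on ascii code points with
the complement of the bit written as 1 - bit. The function name f_stego follows
the paper; the Python rendering itself is an assumption.

```python
def f_stego(c, bit):
    """Equation (1) on code points: l(c) * (1 - b) + u(c) * b for letters."""
    if not ("a" <= c <= "z" or "A" <= c <= "Z"):
        return c  # not a candidate: copied unchanged
    lower, upper = ord(c.lower()), ord(c.upper())
    return chr(lower * (1 - bit) + upper * bit)


cover, bits = "html", [1, 0, 1, 1]
print("".join(f_stego(c, b) for c, b in zip(cover, bits)))  # HtML
```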



The number of bits ($k$) of the secret message embedded into the html cover
text must also be embedded inside the html (e.g., in the Header element).
Figure 1 and Algorithm 1 together explain this embedding algorithm.
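
One possible way to carry the length $k$ is sketched below, using the
'<Header 25>' form given in the introduction. Placing the tag at the very start
of the page and the names add_length_tag and read_length_tag are assumptions
made for illustration; the match is case-insensitive because the letters of the
tag may themselves be altered by the embedding.

```python
import re

# Matches the paper's example form, e.g. <Header 25> or <hEaDeR 25 >.
LENGTH_TAG = re.compile(r"<header\s+(\d+)\s*>", re.IGNORECASE)


def add_length_tag(html, k):
    """Prepend a tag recording that k secret bits are embedded."""
    return f"<Header {k}>" + html


def read_length_tag(html):
    """Return the recorded bit count, or None if no such tag is present."""
    match = LENGTH_TAG.search(html)
    return int(match.group(1)) if match else None


page = add_length_tag("<html><body>Hi</body></html>", 25)
print(read_length_tag(page))  # 25
```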



[Figure: an input cover page with lowercase tags (<html>, <head>, <title>,
<body>, <h1>, <p>), the bit stream to be embedded in the web page
(1011010000110 1100110100101 100111000010; total number of bits to embed = 38),
and the output stego page with mixed-case tags. For example, embedding the
secret data 1011 in the input cover data < h t m l > gives the output stego
data < H t M L >.]

Figure 1: Illustration of how the data hiding works



Algorithm 1 Embedding Algorithm

Search for all the html tags present in the html cover text and extract all
  the characters $c_1 c_2 \ldots c_n$ from inside those tags using the DFA
  described in Figure 2.
Embed the secret message length $k$ inside the html header in the stego text.
$j \leftarrow 0$.
for $c_i \in HTMLTAGS$, $i = 1 \ldots n$ do
    if $c_i \in \{'a' \ldots 'z'\} \cup \{'A' \ldots 'Z'\}$ then
        $f_{stego}(c_i) = l(c_i).\overline{b_j} + u(c_i).b_j$.
        $j \leftarrow j + 1$.
    else
        $f_{stego}(c_i) = c_i$.
    end if
    if $j == k$ then
        break.
    end if
end for

3 The Algorithm for Extraction

The algorithm for extraction of the secret message bits is even simpler. As in
the embedding process, we must first separate out the candidate text (the text
within tags) that was chosen for embedding secret message bits. We must also
extract the number of bits ($k$) embedded into this page (e.g., from the header
element). One has to use 'view source' to obtain the stego-text.

Now we have $d_i = f_{stego}(c_i)$, $i \in \{1, 2, \ldots, n\}$. Only if
$d_i \in \{'a' \ldots 'z'\} \cup \{'A' \ldots 'Z'\}$, i.e., $d_i$ is a letter,
is it a candidate for decoding, and to extract $b_j$ from $d_i$ we use the
following logic:

$$b_j = \begin{cases} 0 & d_i \in \{'a' \ldots 'z'\} \\ 1 & d_i \in \{'A' \ldots 'Z'\} \end{cases}$$

Repeat the above step until all $k$ hidden bits have been extracted. Figure 2
and Algorithm 2 together explain this extraction algorithm.
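
The decoding rule can be sketched as follows. As with any illustration here,
this is written under assumptions: quoted attribute values are skipped, the
length $k$ is already known, and the name extract is hypothetical.

```python
def extract(stego, k):
    """Read k bits from the case of the letters inside tags: lower=0, upper=1."""
    bits = []
    in_tag, quote = False, ""
    for ch in stego:
        if len(bits) == k:
            break                    # all hidden bits recovered
        if not in_tag:
            in_tag = ch == "<"
        elif quote:
            if ch == quote:          # end of a quoted attribute value
                quote = ""
        elif ch in "\"'":
            quote = ch
        elif ch == ">":
            in_tag = False
        elif ch.isascii() and ch.isalpha():
            bits.append(1 if ch.isupper() else 0)
    return bits


print(extract("<HtML><hEad></head></html>", 6))  # [1, 0, 1, 1, 0, 1]
```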
[Figure: block diagram with four parts. (1) The browser function $f_B$ for text
inside html tags, with $f_B$("<html>") = $f_B$("<HTML>") = $f_B$("<HtmL>").
(2) A DFA that extracts the text inside html tags, with a "Skip Attributes"
step. (3) The embedding algorithm, which takes the characters
$c_1 c_2 \ldots c_n$ inside the html tags of the cover text (input html) and
the sequence of secret message bits $b_1 b_2 \ldots b_k$ to embed, and applies
$l(\cdot)$ = ToLowerCase or $u(\cdot)$ = ToUpperCase to each letter; any other
character is not a candidate for embedding a secret bit. (4) The extraction
algorithm, which reads the characters $d_1 d_2 \ldots d_n$ inside the html tags
of the stego text.]

Figure 2: Basic block-diagram for html data-hiding technique
<html>
<head>
<title> Test Html (Steganography - Seeing the Unseen)</title>
</head>
<body>
<table border="1" cellspacing="5" cellpadding="10">
<tr>
<td>&nbsp; </td>
<td colspan="2"> <h1> Embedding Secret Message, can you see what it is</h1></td>
</tr>
<tr>
<td rowspan="2" align="left" valign="top">
<p> Image Processing Techniques<br>
Histogram equalization<br>
Bilevel Thresholding<br>
Optimal Thresholding<br>
LOG Filter<br>
DFT DCT DWT<br>
Segmentation<br>
JPEG Compression </p></td>
<td><h1>The Lena Image - we don't embed secret message inside Lena!!!</h1></td>
<td align="center" valign="middle"> <img src="images/lena.bmp" alt="Lena" width="256" height="128" border="0">
</td>
</tr>
<tr>
<td align="right" valign="bottom"> <img src="images/lena.bmp" alt="Lena" width="256" height="128" border="0">
</td>
<td> <p>Lena is an image that is used to test many of the classical image processing
techniques and algorithms including histogram equalization, optimal thresholding
<b>LSB data hiding</b>
(LSB data hiding technique changes the LSB s of the image data to hide secret message);
<b>Steganography</b>
(Steganography is seeing the unseen)
<b>Digital Watermarking</b>
(But here we do not embed inside the image).
Instead we use case insensitivity of html tags to embed our secret message.</p></td>
</tr>
<tr>
<td colspan="3" align="left" valign="top">Embed secret message inside
<b>cover text</b> to get
<b>stego text</b> and be happy since both the stego html and the cover html look
exactly identical in the browser. No one can even guess that there is some
secret message already embedded inside it.</td>
</tr>
<tr>
<td colspan="3" align="left" valign="top">Finally, extraction of the secret message is also
<b>easy</b> (just you need to do)
<b>view source</b> (and extract the message). Just checking the cases of the html tags reveals
the embeded message. Can make this simple technique even more robust by using some
simple encryption technique in addition</td>
</tr>
</table>
<p>&gt; <a href="index.htm">Steganography examples</a></p>
</body>
</html>

Cover Html Source

Figure 3: Cover Html source before embedding the secret message
Stego Html Source: the html source of Figure 3 after embedding, in which only
the case of the characters inside the tags differs from the cover source.

Secret message embedded: "Copyright @ Sandipan Dey"

Embedded bit stream (length 200):
01000011 01101111 01110000 01111001 01110010 01101001 01100111 01101000
01110100 00100000 01000000 00100000 01010011 01100001 01101110 01100100
01101001 01110000 01100001 01101110 00100000 01000100 01100101 01111001
00000000

Figure 4: Stego Html source after embedding the secret message
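
The 200-bit stream of Figure 4 is consistent with the 8-bit ascii codes of the
24-character secret message followed by eight zero bits. The sketch below
reproduces that encoding; reading the trailing zeros as a NUL terminator is an
assumption inferred from the bit stream, and the names message_to_bits and
bits_to_message are hypothetical.

```python
def message_to_bits(message):
    """Encode ascii text as 8 bits per character plus a zero terminator byte."""
    data = message.encode("ascii") + b"\x00"  # terminator: assumed, see above
    return "".join(f"{byte:08b}" for byte in data)


def bits_to_message(bits):
    """Inverse of message_to_bits: regroup into bytes and drop the terminator."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.rstrip(b"\x00").decode("ascii")


stream = message_to_bits("Copyright @ Sandipan Dey")
print(len(stream), stream[:16])  # 200 0100001101101111
print(bits_to_message(stream))   # Copyright @ Sandipan Dey
```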
[Figure: screenshot of the page rendered in Windows Internet Explorer, titled
"Test Html (Steganography - Seeing the Unseen)". The rendered page shows the
heading "Embedding Secret Message, can you see what it is", the list of image
processing techniques, the Lena image and the accompanying paragraphs of the
html source in Figure 3.]

Figure 5: Cover & Stego Html
Algorithm 2 Extraction Algorithm

Search for all the html tags present in the html stego text and extract all
  the characters $d_1 d_2 \ldots d_n$ from inside those tags using the DFA
  described in Figure 2.
Extract the secret message length $k$ from inside the html header in the stego
  text.
$j \leftarrow 0$.
for $d_i \in HTMLTAGS$, $i = 1 \ldots n$ do
    if $d_i \in \{'a' \ldots 'z'\}$ then
        $b_j = 0$.
        $j \leftarrow j + 1$.
    else if $d_i \in \{'A' \ldots 'Z'\}$ then
        $b_j = 1$.
        $j \leftarrow j + 1$.
    end if
    if $j == k$ then
        break.
    end if
end for

[Figure: histogram of character frequencies by ASCII value (horizontal axis,
with the ranges 'A' ... 'Z' and 'a' ... 'z' marked) for two series, cover html
and stego html.]

Figure 6: Cover vs Stego Html Histogram



Figures 3, 4 and 5 show an example of how our method works, while Figure 6
compares the histograms of the cover html and the stego html in terms of
(ascii) character frequencies. Classical image data hiding techniques such as
LSB data hiding always introduce some (visible) distortion [5] in the stego
image (which can be reduced using techniques such as [3], [7], [5]), but our
data hiding technique in html is novel in the sense that it introduces no
visible distortion in the stego text at all.
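
A histogram comparison of this kind can be sketched on a toy pair of pages as
below. The two strings and the name ascii_histogram are illustrative
assumptions, not the data behind Figure 6. The sketch shows that only letter
counts move between the uppercase and lowercase ranges, while the case-folded
histograms coincide.

```python
from collections import Counter


def ascii_histogram(text):
    """Frequency of each character code in the text."""
    return Counter(ord(ch) for ch in text)


cover = "<html><body><p>Hello</p></body></html>"
stego = "<HtML><bOdY><P>Hello</p></BODY></html>"

cover_hist, stego_hist = ascii_histogram(cover), ascii_histogram(stego)
changed = sorted(
    chr(code)
    for code in cover_hist.keys() | stego_hist.keys()
    if cover_hist[code] != stego_hist[code]
)
print(changed)  # only letters appear here
print(ascii_histogram(cover.lower()) == ascii_histogram(stego.lower()))
```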
4 Conclusions



In this paper we presented an algorithm for hiding data in html text. The
technique can be extended to any case-insensitive language, with data embedded
in a similar manner; e.g., we can embed secret message bits in source code
written in languages like Basic or Pascal, or in the case-insensitive sections
(e.g., comments) of case-sensitive languages like C. Data hiding methods for
images result in distorted stego-images, but the html data hiding technique
does not create any visible distortion in the stego html text.